INTRODUCTION


In the approach to problems dealing with the promotion of good
health for the public, we may think of the public as being composed 
of persons in an atomistic sense, or composed of various clumps of per- 
sons, such as families, blocks, communities, and so on. According to the 
nature of the problem, the ultimate unit of study may be the individual 
or it may be some one of these groups. Since in either case the individual 
belongs to the aggregate and the aggregate is composed of individuals, 
there is sometimes confusion as to what the unit of study actually is. 
The term “family studies” is apt to be used for any problem where
the family roster enters the field of observation in any way. In certain 
cases the family is considered in order to provide a more complete under- 
standing of the individual than could be gained by studying him detached 
from his family group. In other cases the family itself is the primary
unit of investigation because it is this aggregate, and not the individual 
members, that is relevant to the problem. This contrast is not unique to 
population studies. The chemist is at times interested in the properties 
of water as related to those of the hydrogen and oxygen atoms which 
form it, and at other times he is interested in water as water. 

Both the methodology and interpretation of family studies are depen- 
dent on the extent to which one or the other of the units, the individual 
or the family, dominates the analysis. Even the family of size one will 
present different implications when viewed as a person living in a family 
composed of himself alone or as a family unit classifiable on the same 
axes as any other undivided family unit. Some of the problems and 
values in using the family as a unit of study will be considered in this 
discussion. 


DEFINITION AND CLASSIFICATION OF THE FAMILY 


In approaching the use of the family as a unit we face some funda- 
mental questions of definition of this unit and classification under the 
definition employed. The variety of structural patterns that we recognize 
as families, and the alteration of these patterns with the flow of time, 
make the problem difficult, particularly in follow-up studies such as 
those involved in chronic disease. 

Whenever we struggle with definitions of units in biological and socio- 
logical studies we are apt to admire the physicist’s greener pastures. 
In this connection J. H. Burn’s (1) introduction to units of measurement 
in his Biological Standardization is encouraging. He says “Every
pharmacologist should visit Winchester. ... of special interest in the 
tower of Westgate are the examples of the standard yard. These are 
curious because they are not all of the same length, and to some, pieces 
have been added. It appears that the yard used to be the length of the 
king’s arm, and therefore when a new king came to the throne the 
standard was changed.” 

Our problem of family unit has some of the same elements of varia- 
bility as the king’s arm, including the situation that must have been 
embarrassing when the king came to the throne before attaining his full 
growth. 

A natural approach to the definition and classification of the family 
is through extensions of those used for the individual. The definition 
of a person and analyses based on various individual classifications are 
usually straightforward. Sometimes the classifications are also easy, as
for example in age, sex, or place of birth, but even for the individual unit 
the relevant measurements and classifications may be troublesome, as in 
the case of nutritional state, size, mental adjustment, or health itself. 

In extending the classifications to the group of persons designated as 
the family, we must recognize that this group is more than a mere col- 
lection of persons. It is a unit in the sense that the philosopher means 
by the term “holism.” This is the concept (2) that “In one degree or 
another, wholes are more than the sum of their parts, and the mechanical 
putting together of their parts will not produce them or account for their 
characters and behavior.” The recognition of this concept for communi- 
ties leads us to classify them not merely in terms of the individuals com- 
posing them but as entities such as metropolitan area, suburban area, 
rural non-farm, and so on. 

This principle is of particular importance when we consider the 
family. A given family is something more than a man of 37, a woman 
of 35, and two children of 6 and 10. There is a great deal that is 
implied in the standard phrase on the census schedule “head of household”
or “relationship to head.” If this were not so, families could be
made up by pulling individuals of the designated characteristics at 
random out of the population and assembling them into groups that 
could be considered family units. 


The appropriate definition of a family unit for a given public health 
inquiry depends of course on the nature of the study. For example, the 
definition may be a social one, related to common shelter and provision 
for food regardless of relationship, it may be restricted to blood relations 
who have a common dwelling unit, or to the group covered by a medical 
insurance plan, or, as in genetic studies, to certain blood relations regard- 
less of place of residence or even whether living or dead. 

In spite of these variations, the setting up of the definition of the unit 
pertinent to the problem is usually less troublesome than the development 
of axes of classification of these units and the actual classification itself. 
The job of selecting appropriate measurements and classification for 
individuals is compounded when applied to the family because of the 
variation of the family members. 


This raises the question as to whether the classification of the in- 
dividuals is any real part of the classification of the family as a unit. At 
times it is not. We may allocate the family as to housing, socio-economic 
status, nutrition, type of medical care, even though the members of the 
family would, if rated, not all fall into the same class as the family as a
whole. At other times the rating of the family calls for an individual 
counting of noses. The most obvious variable of this type is size of 
family, and others are number of wage earners, rooms per person, and 
so on. 

The fact that the individual members of the family may be used to 
classify families without destroying the integrity of the family as a unit, 
does not mean that they are always used in this way. A problem will 
often start on a family basis, will describe the members of the family and 
become at that point an individual and not a family study at all. This is 
particularly apt to happen in treating age considered as a variable. Since 
the age of the family as a unit is not the same as the age of the individual 
members composing it, the age scale desired needs to be critically selected. 
But even if we want to deal with the ages of the separate members, unless 
these are linked together so as to keep the family intact as a unit, the 
study becomes an individual one and ceases to be a family one in any 
real sense of the word. 


ILLUSTRATIONS OF STUDIES USING THE FAMILY UNIT TO VARIOUS EXTENTS 


In order to distinguish studies using the family as a unit from other 
family studies a few illustrations will be presented in increasing order as 
to the extent to which the family is the real unit of analysis. 

In various types of investigation the family members come under 
observation because of an event which occurs in the family, perhaps to 
one member, which may affect the others. The most familiar example is 
the secondary attack rate in infectious disease. A case of the disease 
occurs in a family and the other members are followed to see how their 
subsequent attack rates compare with those of the general community, or 
better still with those of persons in similar families not having a primary 
case. This study starts with the family, because the primary case brings 
the whole family under observation. But the way in which these people 
are studied is usually to pool all members of such families, classify them 
by age, and determine age specific attack rates, which are then compared 
with those in the community or the control families. The only classifica- 
tion of these individuals on a family basis is as to whether or not they 
have familial association with a case. The family itself is no more a unit 
of study than if we had picked people whose surname began with A and 
compared them with those whose surname began with B. Both the classi- 
fication and analysis are on an individual basis. 

At times in this problem, the family unit is maintained and further
classified although the final analysis remains on an individual basis. In 
1939, E. B. Wilson and his associates (3) analyzed measles and scarlet 
fever in Providence, R. I., in this way. Thus they classified families as 
to their size, number of susceptibles, and number of primary cases, and 
computed age-adjusted secondary attack rates for these various types of 
families. The linking of the members to allow this further family 
classification contributed materially to the final interpretation of the 
individual secondary attack rates. 

This study also analyzed the data on a complete family unit basis to 
give family attack rates rather than individual attack rates alone. With 
the families subclassified as to number of susceptibles it was possible to 
take the family of three susceptibles, for example, and determine the 
number having 0, 1, 2, and 3 cases. This distribution could then be com- 
pared with certain epidemic theory yielding the expected distribution of 
such families. Since for some diseases, a community epidemic is largely 
the summation of little family epidemics, the analysis in terms of family 
units goes to the heart of the epidemic problem, and is far more illumi- 
nating than the usual study of individual secondary attack rates where 
the family is only a selecting and primary sorting mechanism. 
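
The family-unit tabulation just described can be sketched in a few lines. This is only an illustration: the household records are invented, and the binomial model with one pooled attack probability is a deliberately simple stand-in for the epidemic theory, which the text does not specify.

```python
from collections import Counter
from math import comb

# Hypothetical records: number of cases among the susceptibles of each
# household, restricted to households with exactly three susceptibles.
cases_per_family = [0, 0, 1, 0, 2, 3, 0, 1, 0, 0, 3, 1]
SUSCEPTIBLES = 3


def observed_distribution(counts, size):
    """Number of families having 0, 1, ..., size cases."""
    tally = Counter(counts)
    return [tally.get(k, 0) for k in range(size + 1)]


def binomial_expected(counts, size):
    """Expected number of families with k cases if every susceptible were
    attacked independently at the pooled individual attack rate
    (an assumed null model, not the theory used in the cited study)."""
    n_families = len(counts)
    rate = sum(counts) / (n_families * size)
    return [
        n_families * comb(size, k) * rate**k * (1 - rate) ** (size - k)
        for k in range(size + 1)
    ]


observed = observed_distribution(cases_per_family, SUSCEPTIBLES)
expected = binomial_expected(cases_per_family, SUSCEPTIBLES)
for k, (o, e) in enumerate(zip(observed, expected)):
    print(f"{k} cases: observed {o} families, expected {e:.2f}")
```

The pooled attack rate alone cannot distinguish cases spread evenly over households from cases concentrated in a few of them; the family-level distribution can.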

A second illustration presented to show the varying extent to which 
the family is taken as the unit, is that of differential fertility. Looking 
first at fertility by age of husband and wife, we have from the 1940 
census cross-tabulations of these ages (4) so that the information on the 
pair is available. From a corresponding tabulation of the ages of 
parents of live births (5), both of the ages being stated (which presumably
eliminates most of the illegitimates), birth rates by age of married 
couples can be calculated. In order to see the effectiveness of this joint 
tabulation we might look first at the birth rates by age of husband and 
wife where these ages are not brought together on a family basis. 

Fig. 1 shows the usual picture of birth rates declining with age of 
husband and wife, the separation of these curves being due to the
characteristic difference in age between husband and wife. However the
“characteristic” difference in ages is only the central value in an
extremely variable marital pattern, and this graph throws no light at all
on birth rates for various combinations of ages in husbands and wives. 

Fig. 2 shows the birth rates by age of wife separately for selected ages 
of husband, these being certain of the contour lines on a 3-dimensional 
surface. It is apparent from this graph that although young wives have 
higher birth rates than older wives, the extent of this differential varies
considerably with the age of husband. In fact women whose husbands 
are over 55 show a relatively small differential with age. Thus an altered 
view of birth rate by age is obtained when the couple is considered 
rather than each member of the pair separately. 

To get at the underlying reasons for these differences calls for a more 
searching inquiry than is ever possible from routinely collected statistics. 
The outstanding investigation of such issues is the Indianapolis study 
on the Social and Psychological Factors Affecting Fertility (6). Up to 
this time 14 papers have appeared analyzing the records obtained in the
1941 study of 2500 native-white Protestant couples with unbroken 
marriages, the marriage having occurred between 1927 and 1929. Many 
different issues have been analyzed from the material; I will mention 
only one, that of the relationship of socio-economic group to fertility. 

There has been a wealth of studies showing that the lower the
economic status, the higher the fertility, some showing a minimum 
fertility at an intermediate economic class. However these studies have 
for the most part been done either on an individual basis (the character- 
istics of the mother or the father) or on very limited sub-classifications 
of the families. 

The Indianapolis study explored a variety of descriptive characteristics 
of families, one of which was the extent to which the couples had children 
according to plan. On this axis, couples were classified in a descending 
order of extent of planning: those who planned both number and spacing 
of children; those who planned the number, in that they deliberately 
planned the last pregnancy; those who did not deliberately plan the last
pregnancy but who welcomed it (designated as “quasi-planned”); and
those who did little or no effective planning and had more children than
they wanted (designated “excess fertility”).

Fig. 3 shows the fertility history of the entire group of families ac- 
cording to one of the economic indices used, average annual income 
earned by the husband since marriage. It shows the usual picture of 
highest fertility with lowest income, with some evidence of a minimum 
value in the moderate income group. However when the families are 
sub-classified according to degree of planning there is a somewhat differ- 
ent picture. 

Fig. 4 shows the fertility rates for the families by both income and 
planning group. It is seen here that the common statement that the 
smaller the income, the higher the fertility, is an oversimplification. For
the completely planned families the reverse is true. The greater fertility
goes with greater income. For the partially planned groups there is no 
very definite trend of fertility with income. Only the unplanned group 
shows the commonly accepted relationship. Thus the over-all picture is 
governed by the proportion of families that are unplanned. This finding 
affects the interpretation and would have considerable bearing on any 
program related to the control of fertility. 


DISCUSSION 


In the two illustrations presented where the family rather than the 
individual is the real unit of study, one used a classification of the 
children in the family as to number and susceptibility, the other used a 
classification of the married couple. Neither of these involves the more 
difficult problems where the essential measurements and classifications 
must cope with the complexities of structure and time-change which 
characterize family patterns. Each such problem requires a thorough 
study of the important issues in the particular case in order to make 
even a first approximation to a satisfactory choice of measurements and 
classification of family units. I shall not attempt a discussion of problems 
of this complexity but will merely make certain general remarks about 
them, in part by way of summary. 

The problem is rare where the composition of the family as to its 
specific members can be neglected in the analysis. Events are unlikely to 
be independent of size of the family and ages of its members. These 
factors may be adjusted for, but they must be considered. But to say 
that the individual members usually have to be considered in the family 
classification is not to say that the family has been classified as a unit 
when its members have been described. Our public health problems that 
are related to families as such demand classification and analysis on a 
family unit basis. 

Three such problems may be cited that either have been studied or are 
now under study in our own laboratory. One of these is the development 
of a life table for a family. This was proposed by Lowell J. Reed some 
years ago as one approach to the study of the effect on the family of the 
occurrence of a chronic disease such as tuberculosis. It raises questions 
fundamental to the study of the family as an entity since it requires a 
definition of the birth and death of a family. Does the family die only 
under the circumstances that all of its members die? The fact that most 
of us would answer “no” to that question implies a meaning to the 
family as a unit, and this meaning for a given problem must be delineated. 

A second problem is one proposed by the Maryland Blue Cross as to
whether there are some families that are hospitalization-prone, and if so, 
what are their characteristics. Since coverage is on a family basis, three 
annual hospitalizations of a week each for any family are the same 
liability from the Blue Cross standpoint whether three different people 
are involved or the same person is hospitalized three different times. The 
family is therefore the essential unit and its classification presents a real 
problem, particularly because of time changes. 
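
A minimal sketch of why the unit matters here, using invented admission records (the family and person identifiers are hypothetical): totalled by family, the two households below carry the same liability, although totalled by person they look quite different.

```python
from collections import defaultdict

# Hypothetical admissions: (family_id, person_id, days in hospital).
admissions = [
    ("F1", "a", 7), ("F1", "b", 7), ("F1", "c", 7),  # three different members
    ("F2", "a", 7), ("F2", "a", 7), ("F2", "a", 7),  # one member, three stays
]


def days_by_family(records):
    """Total hospital days with the family as the unit."""
    totals = defaultdict(int)
    for family_id, _person_id, days in records:
        totals[family_id] += days
    return dict(totals)


def days_by_person(records):
    """Total hospital days with the individual as the unit."""
    totals = defaultdict(int)
    for family_id, person_id, days in records:
        totals[(family_id, person_id)] += days
    return dict(totals)


family_totals = days_by_family(admissions)
person_totals = days_by_person(admissions)
print(family_totals)
print(person_totals)
```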

A third illustration concerns the migration of families. For certain 
follow-up studies families may be sought that can be kept under observa- 
tion for a long time. It therefore is critical to know what types of 
families are likely to move frequently and far and what types stay put 
(7, 8, 9). Measurements describing the family unit are not only necessary 
in order to answer the initial question, but also to understand the way 
in which successfully followed families may differ from those in the 
population as a whole, when it comes to generalizing from the findings 
in a follow-up study. 

All such studies call for imagination and insight to determine the 
measurements on the family which are relevant to the problem. The 
development of such measurements and the classification of families for 
these variables are as necessary a part of the statistical methodology as 
the sampling procedures. In spite of the great interest in family studies 
this aspect in their planning and execution is easy to overlook. We tend 
to fall back on an analysis of individual family members. At times the 
individual unit is really wanted. At other times the individual unit is 
used by default, either because it is not fully appreciated that a family 
unit is required, or because the problem of classifying family units is too 
difficult. Further development of statistical methodology to cope with 
these measurement and classification issues will open up new approaches 
to such public health problems. 


STATISTICAL ANALYSIS OF THE A-B-O SYSTEM
IN MIXED POPULATIONS * 


BY W. L. STEVENS 


Faculdade de Ciências Econômicas e Administrativas,
Universidade de São Paulo, Brazil


SUMMARY 


OBSERVATIONS on the genetic frequencies in populations, derived
from the mixture of two homogeneous populations, provide information
on the proportions in which the two original populations contributed
to the mixture. Statistical methods are given, with worked
examples, for using data on the A-B-O system of blood groups in studies
of population mixing. The methods are also of wider interest, since 
they solve the linear regression problem in a general form. 


1. INTRODUCTION 


If a mixed population is supposed to have been formed by the
blending of two homogeneous populations, observations on genetic
frequencies will provide information about the proportions in which the
two original populations have been mixed (Bernstein, 1932; Ottensooser,
1944; Boyd, 1950).

This paper explains how to undertake a study of genetic frequencies 
in the A-B-O blood groups, with a view to testing the hypothesis of
population mixture and of estimating the proportions in which the 
original populations have contributed. 

At the same time, the methods are of general interest. The problem 
of adjusting a regression line, when both variables are subject to “ error,” 
is one which rarely, if ever, receives an adequate treatment in the
textbooks. If, in what follows, we regard p and q as any pair of variables,
the methods show how to solve the linear regression problem in its most
general form, provided the deviations are, with adequate approximation,
binormally distributed.

* The notation and principles used in this paper are explained in a previous
paper in this journal (Stevens, 1950), which should accordingly be used for
reference.

Let us suppose that the original homogeneous populations, Y and Z,
possess the genes A and B respectively in the proportions:


Gene      Race Y        Race Z
A         $\lambda$     $\lambda + \gamma$
B         $\mu$         $\mu + \varepsilon$


Then a population formed by mixing Y and Z in the proportions (1 - x)
of Y with x of Z, will possess the genes A and B respectively in the
proportions P and Q, where

$$P = \lambda + \gamma x, \qquad Q = \mu + \varepsilon x.$$


The quantity, x, is often called the “degree of mixture.” If the
proportions for Y, Z and the mixture are plotted on a graph, the point 
representing the mixture will lie on a straight line joining the points 
which represent Y and Z. 
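
The collinearity can be checked numerically. The sketch below is illustrative only; it borrows the proportions assumed later in section 3.2 (no A or B gene in Y, 0.27 and 0.08 in Z), and the function name `mixture` is ours.

```python
def mixture(y, z, x):
    """Gene proportions (A, B) after mixing (1 - x) of Y with x of Z."""
    return tuple((1 - x) * yi + x * zi for yi, zi in zip(y, z))


# Proportions of genes A and B in the two original populations.
Y = (0.0, 0.0)     # (lambda, mu)
Z = (0.27, 0.08)   # (lambda + gamma, mu + epsilon)

for x in (0.0, 0.25, 0.5, 1.0):
    P, Q = mixture(Y, Z, x)
    # Cross product of (P, Q) - Y with Z - Y; zero means the three points
    # lie on one straight line.
    cross = (P - Y[0]) * (Z[1] - Y[1]) - (Q - Y[1]) * (Z[0] - Y[0])
    print(f"x = {x:.2f}: P = {P:.4f}, Q = {Q:.4f}, cross product = {cross:.1e}")
```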

In practice, it will often happen that the research worker has series 
from a number of populations in which the mixing proportions vary. 
Taking into account also the sampling variation, we shall therefore 
expect the points which represent the estimated gene frequencies to lie 
scattered about the line joining Y to Z. The statistical problem is to
test whether the points diverge from the line more than can be attributed
to random sampling and, if they do not, to estimate x and its
standard error for each of the mixed populations.

The foregoing treatment is appropriate, either when the genetic 
proportions in Y and Z are given by hypothesis or when these have 
been estimated from much larger numbers of observations than are 
available for the mixed populations. 

Less restricted cases arise, if the proportions in one or both of the 
original populations are estimated from series comparable in length 
with the others, or if one or both of the original populations has been lost. 

If we can assume that the genetic proportions in population Y are
respectively $\lambda$ and $\mu$, but make no assumption in respect of population
Z, the genetic proportions in any mixture and in population Z will be
given by

$$P \quad\text{and}\quad Q = \mu + b(P - \lambda),$$

i.e., the points representing the genetic proportions lie on some line
through Y. There enters the additional problem of estimating b, i.e.,
of determining the slope of the line.

Finally, making no assumptions at all about the original populations, 
we may ask if the genetic proportions are related by a linear regression
of the type,

$$Q = m + bP.$$


Here we shall have to estimate both m and b. 


2. METHODS 
2.1 Genetic proportions given for the two original populations 


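Here

$$P = \lambda + \gamma x, \qquad Q = \mu + \varepsilon x. \tag{2.11}$$

For each of the mixed series, we first find efficient estimates p and q, of
the proportions of A and B genes. The density of the sampling distribution
of p and q, omitting the constant factor, is given approximately by

$$\exp\{-\tfrac{1}{2} n (i_{pp}\,dp^2 + 2 i_{pq}\,dp\,dq + i_{qq}\,dq^2)\} \tag{2.12}$$

where P, Q is the point on the regression line (2.11) representing the
true gene proportions,

$$dp = p - P, \qquad dq = q - Q, \tag{2.13}$$

and $i_{pp}$, $i_{pq}$, $i_{qq}$ are the components of the information matrix for the
point P, Q (Stevens, 1950, section 1.2).

The efficient estimate of x can be found by the method of maximal
likelihood which, to the present approximation, is equivalent to minimal
$\chi^2$. The logarithm of likelihood is

$$L = -\tfrac{1}{2} n \{ i_{pp}(p - \lambda - \gamma x)^2 + 2 i_{pq}(p - \lambda - \gamma x)(q - \mu - \varepsilon x) + i_{qq}(q - \mu - \varepsilon x)^2 \}.$$

Hence,

$$(i_{pp}\gamma + i_{pq}\varepsilon)(p - \lambda - \gamma x) + (i_{pq}\gamma + i_{qq}\varepsilon)(q - \mu - \varepsilon x) = 0,$$

and therefore

$$x = \frac{(i_{pp}\gamma + i_{pq}\varepsilon)(p - \lambda) + (i_{pq}\gamma + i_{qq}\varepsilon)(q - \mu)}{i_{pp}\gamma^2 + 2 i_{pq}\gamma\varepsilon + i_{qq}\varepsilon^2}, \tag{2.14}$$

where $i_{pp}$, $i_{pq}$ and $i_{qq}$ are the terms of the information matrix for values
P and Q given by equation (2.11).

The variance of x is given by

$$\sigma^2 = \frac{1}{n(i_{pp}\gamma^2 + 2 i_{pq}\gamma\varepsilon + i_{qq}\varepsilon^2)}, \tag{2.15}$$

where n is the number of observations in the series. The variances and
covariance of P and Q are

$$\operatorname{var}(P) = \gamma^2\sigma^2, \qquad \operatorname{cov}(P, Q) = \gamma\varepsilon\sigma^2, \qquad \operatorname{var}(Q) = \varepsilon^2\sigma^2. \tag{2.16}$$

The value of $\chi^2$ for testing deviations from the line is found from

$$\chi^2 = S\,w\{\gamma(q - \mu) - \varepsilon(p - \lambda)\}^2, \tag{2.17}$$

where S denotes summation over the series, and

$$w = \frac{n(i_{pp} i_{qq} - i_{pq}^2)}{i_{pp}\gamma^2 + 2 i_{pq}\gamma\varepsilon + i_{qq}\varepsilon^2}. \tag{2.18}$$

The number of degrees of freedom is equal to the number of series.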
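
The step from the likelihood to the weight w in (2.18) is not spelled out. A short derivation, offered as a reconstruction rather than as the author's own argument, runs as follows; the abbreviations u, v, a, c, D and t are ours.

```latex
% Abbreviations (not in the original):
%   u = p - \lambda,  v = q - \mu,
%   a = i_{pp}\gamma + i_{pq}\varepsilon,  c = i_{pq}\gamma + i_{qq}\varepsilon,
%   D = a\gamma + c\varepsilon = i_{pp}\gamma^2 + 2 i_{pq}\gamma\varepsilon + i_{qq}\varepsilon^2.
\begin{align*}
x &= \frac{au + cv}{D}
  && \text{by (2.14)} \\
dp &= u - \gamma x = -c\,t, \qquad dq = v - \varepsilon x = a\,t,
  && t = \frac{\gamma v - \varepsilon u}{D} \\
i_{pp}\,dp^2 + 2 i_{pq}\,dp\,dq + i_{qq}\,dq^2
  &= \left(i_{pp}c^2 - 2 i_{pq}ac + i_{qq}a^2\right) t^2 \\
  &= \left(i_{pp} i_{qq} - i_{pq}^2\right) D\, t^2 \\
n\left(i_{pp}\,dp^2 + 2 i_{pq}\,dp\,dq + i_{qq}\,dq^2\right)
  &= \frac{n\left(i_{pp} i_{qq} - i_{pq}^2\right)}{D}\,
     \{\gamma(q - \mu) - \varepsilon(p - \lambda)\}^2
   = w\,\{\gamma(q - \mu) - \varepsilon(p - \lambda)\}^2 .
\end{align*}
```

Each series therefore contributes a single squared deviation, weighted by w, which is why every contribution to (2.17) carries one degree of freedom.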


2.2 Genetic proportions given for only one of the original populations 
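Here

$$Q = \mu + b(P - \lambda). \tag{2.21}$$

The genetic proportion, P, is estimated by the equation

$$P - \lambda = \frac{(i_{pp} + i_{pq} b)(p - \lambda) + (i_{pq} + i_{qq} b)(q - \mu)}{i_{pp} + 2 i_{pq} b + i_{qq} b^2}, \tag{2.22}$$

where,

$$b = \frac{S\,w(P - \lambda)(q - \mu)}{S\,w(P - \lambda)(p - \lambda)} \tag{2.23}$$

and

$$w = \frac{n(i_{pp} i_{qq} - i_{pq}^2)}{i_{pp} + 2 i_{pq} b + i_{qq} b^2}. \tag{2.24}$$

The variance of b is found from

$$\operatorname{var}(b) = \frac{1}{S\,w(P - \lambda)^2}. \tag{2.25}$$

For testing deviations from the line, we use

$$\chi^2 = S\,w\{(q - \mu) - b(p - \lambda)\}^2 \tag{2.26}$$

with degrees of freedom one fewer than the number of series.
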
2.3 No assumptions about the two original populations 
Here we assume that

$$Q = m + bP. \tag{2.31}$$

Then P is estimated from

$$P = \frac{(i_{pp} + i_{pq} b)p + (i_{pq} + i_{qq} b)(q - m)}{i_{pp} + 2 i_{pq} b + i_{qq} b^2}, \tag{2.32}$$

where m and b are roots of the pair of simultaneous linear equations,

$$\begin{aligned}
m \cdot S w + b \cdot S wp &= S wq, \\
m \cdot S wP + b \cdot S wPp &= S wPq,
\end{aligned} \tag{2.33}$$

and w is given in equation (2.24).

The information matrix of (m, b) is

$$\begin{pmatrix} S w & S wP \\ S wP & S wP^2 \end{pmatrix} \tag{2.34}$$

whence, by inversion, we may obtain the covariance matrix of the pair
of estimates (m, b).

For testing deviations from the regression line, we use

$$\chi^2 = S\,w(q - m - bp)^2, \tag{2.35}$$

with degrees of freedom two fewer than the number of series.

We also have expressions for the variances and covariance of P and
Q, for cases (2.2) and (2.3), but, as they are rather heavy and will
not often be needed, they are not given here.
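
One pass of the computation for case (2.3) might be sketched as follows, assuming the weights w and the current estimates P are held fixed for the pass. In practice both depend on m and b, so the pass would be repeated. The function name `fit_line` and the input numbers are illustrative, not taken from the paper.

```python
def fit_line(p, q, w, P):
    """Solve equations (2.33) for m and b with w and P held fixed, then
    invert the information matrix (2.34) and evaluate chi-square (2.35)."""
    Sw = sum(w)
    Swp = sum(wi * pi for wi, pi in zip(w, p))
    Swq = sum(wi * qi for wi, qi in zip(w, q))
    SwP = sum(wi * Pi for wi, Pi in zip(w, P))
    SwPp = sum(wi * Pi * pi for wi, Pi, pi in zip(w, P, p))
    SwPq = sum(wi * Pi * qi for wi, Pi, qi in zip(w, P, q))
    SwPP = sum(wi * Pi * Pi for wi, Pi in zip(w, P))

    # Cramer's rule on the 2 x 2 system (2.33).
    det = Sw * SwPp - Swp * SwP
    m = (Swq * SwPp - Swp * SwPq) / det
    b = (Sw * SwPq - SwP * Swq) / det

    # Covariance matrix of (m, b): inverse of [[Sw, SwP], [SwP, SwPP]].
    info_det = Sw * SwPP - SwP * SwP
    cov = [[SwPP / info_det, -SwP / info_det],
           [-SwP / info_det, Sw / info_det]]

    chi2 = sum(wi * (qi - m - b * pi) ** 2 for wi, pi, qi in zip(w, p, q))
    return m, b, cov, chi2


# Hypothetical series lying exactly on q = 0.01 + 0.3 p, with P set to p.
p = [0.05, 0.10, 0.20]
q = [0.025, 0.04, 0.07]
w = [1.0e4, 2.0e4, 1.5e4]
m, b, cov, chi2 = fit_line(p, q, w, P=p)
print(f"m = {m:.4f}, b = {b:.4f}, var(b) = {cov[1][1]:.3e}, chi2 = {chi2:.2e}")
```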


3. APPLICATION 
3.1 Data and estimation of genetic proportions 


It is evident that the above equations do not yield explicit solutions. 
A process of successive approximations will therefore be needed. This 
process will be illustrated for cases (2.1) and (2.3): there should be 
no difficulty in seeing how to adapt it to the intermediate case (2.2). 

The data were obtained by the late Dr. E. M. da Silva, in the interior 
of Brazil (Mato Grosso). Five series (I-V) were from populations 
believed to be derived from mixtures of Indian and white blood. The 
sixth (L. W.) is described as “local whites,” i.e., nominally white people
living in the same region. 

One must, of course, be cautious in applying large-sample theory to 
such small series as these, although we do not think that any substan- 
tially wrong conclusion will emerge. In any event, it is the best we 
can do: there is no prospect of constructing exact small-sample tests for
this problem. It may be noted that the square-root transformation device, 
proposed in Stevens (1950), is of no help here, because it would convert 
the simple linear regression into an unmanageable elliptical regression.

The data were tested for agreement with the genetic hypothesis and 
the genetic proportions, p and q, were estimated by Bernstein’s method. 
Data and estimates are shown in Table 1. 


TABLE 1

SERIES      O     A     B    AB      n       p         q

I         133    59    22     3    217    0.1551    0.0594
II         74    26     5     1    106    0.1366    0.0287
III        38    12     3     5     58    0.1560    0.0701
IV         53     5     5     0     63    0.0405    0.0405
V          38     4     0     0     42    0.0488    0

L. W.      94    53    18     7    172    0.1923    0.0752
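
The estimates in Table 1 can be approximately reproduced with a few lines of code. The sketch assumes the usual form of Bernstein's adjusted estimates, which the paper names but does not write out; the function name and argument order are ours.

```python
from math import sqrt


def bernstein(o, a, b, ab):
    """Bernstein's adjusted estimates (p, q) of the A and B gene proportions
    from the counts of blood groups O, A, B and AB (assumed standard form)."""
    n = o + a + b + ab
    p0 = 1 - sqrt((o + b) / n)   # first estimate of the A gene proportion
    q0 = 1 - sqrt((o + a) / n)   # first estimate of the B gene proportion
    r0 = sqrt(o / n)             # first estimate of the O gene proportion
    d = 1 - (p0 + q0 + r0)       # amount by which the three fail to sum to 1
    return p0 * (1 + d / 2), q0 * (1 + d / 2)


for label, counts in {"II": (74, 26, 5, 1), "L. W.": (94, 53, 18, 7)}.items():
    p, q = bernstein(*counts)
    print(f"{label}: p = {p:.4f}, q = {q:.4f}")
```

Agreement with the tabulated values is to about the fourth decimal place; small differences in the last digit may reflect rounding in the original computation.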


3.2 Genetic proportions assumed for the Indian and white populations 


Following Dr. Ottensooser (1944), we work with the hypothesis that 
the original Indian population possessed neither the A nor the B gene 
and that a typical white population has 0.27 of gene A and 0.08 of
gene B,

$$\lambda = 0, \qquad \lambda + \gamma = 0.27,$$
$$\mu = 0, \qquad \mu + \varepsilon = 0.08.$$

Hence

$$\gamma = 0.27 \quad\text{and}\quad \varepsilon = 0.08.$$


The points, representing the pairs of estimates (p, q), are plotted on
the graph shown in the figure. The assumed regression is the straight
line joining $(\lambda, \mu)$ to $(\lambda + \gamma, \mu + \varepsilon)$, i.e., the point (0, 0) to the point
(0.27, 0.08).

FIG. 1. ESTIMATES (p, q), REGRESSION LINE, CONJUGATE DIRECTION, AND
ESTIMATES (P, Q)


We now seek estimates of P and Q, i.e., of points on the regression 
line, from which the observed points (p,q) are presumed to diverge in 
consequence of the errors of random sampling. The preliminary esti- 
mates can conveniently be obtained by a graphical construction. 

Choose a typical pair of values (p,q) lying somewhere near the 
middle of the picture, say 


$$p = 0.15, \qquad q = 0.05.$$


The terms of the corresponding information matrix, calculated from 
the tables in Stevens (1945 or 1950), are 


$$i_{pp} = 15, \qquad i_{pq} = 2, \qquad i_{qq} = 41.$$

We now introduce what we shall call the “direction conjugate to the
regression line” or, more simply, the “conjugate direction.” * To determine
this direction we find the tangent of the angle which the direction
forms with the horizontal, by the expression,

$$-\frac{i_{pp}\gamma + i_{pq}\varepsilon}{i_{pq}\gamma + i_{qq}\varepsilon} = -\frac{4.21}{3.82} = -1.10.$$

TABLE 2

$\lambda = 0$, $\mu = 0$, $\gamma = 0.27$, $\varepsilon = 0.08$

          PRELIMINARY ESTIMATES     INFORMATION MATRIX
SERIES       P        Q        $i_{pp}$   $i_{pq}$   $i_{qq}$    $i_{pp}\gamma + i_{pq}\varepsilon$    $i_{pq}\gamma + i_{qq}\varepsilon$
I          0.163    0.049     13.5     2.2      42.3        3.821        3.970
II         0.128    0.038     16.8     2.2      54.2        4.712        4.930
III        0.173    0.051     12.8     2.3      40.8        3.640        3.885
IV         0.061    0.018     33.9     2.1     112.2        9.321        9.543
V          0.038    0.011     53.7     2.0     182.7       14.659       15.156
L. W.      0.205    0.061     10.9     2.3      34.4        3.127        3.373

SERIES     $(p - \lambda)$   $(q - \mu)$      C         D        $\sigma^2$       $x \pm \sigma$       $x'$
I          0.1551    0.0594    0.8285    1.3493    0.00342    0.614 ± 0.058    0.60
II         0.1366    0.0287    0.7852    1.6666    0.00566    0.471 ± 0.075    0.47
III        0.1560    0.0701    0.8402    1.2936    0.01333    0.650 ± 0.115    0.65
IV         0.0405    0.0405    0.7640    3.2801    0.00484    0.233 ± 0.070    0.23
V          0.0488    0         0.7154    5.1704    0.00460    0.138 ± 0.068    0.14
L. W.      0.1923    0.0752    0.8550    1.1141    0.00522    0.767 ± 0.072    0.77

SERIES     $w \times 10^{-4}$    $\{(q - \mu)\gamma - (p - \lambda)\varepsilon\} \times 10^{2}$    $\chi^2$     PROBABILITY
I            9.08           +0.363            1.20
II           5.76           -0.318            0.58
III          2.32           +0.645            0.97
IV           7.30           +0.770            4.33        4%
V            7.97           -0.390            1.21
L. W.        5.71           +0.492            1.38
Total                                         9.67        >10%

* Readers interested in the geometrical interpretation will notice that the 
regression line and the “conjugate direction” are parallel to conjugate axes of
the standard ellipse. Hence the term. 

Hence, if we construct a right-angled triangle on a base of ten units
and with a height of eleven units, the hypotenuse shows the conjugate 
direction. The negative sign indicates that the direction slopes down- 
wards to the right: had the sign been positive, the direction would have 
been constructed to slope upwards. 

Finally, we draw through each plotted point a line parallel to the 
conjugate direction to intersect the regression line in the required point 
(P,Q). 

Using these pairs of values as preliminary estimates, the rest of the 
computation, shown in Table 2, follows in a straightforward manner. 
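In order that the model may serve in general cases, we have given columns
headed $(p - \lambda)$ and $(q - \mu)$ although in our particular problem (where
$\lambda = \mu = 0$) these values are identical with p and q, respectively. Note
that

$$\begin{aligned}
C &= (i_{pp}\gamma + i_{pq}\varepsilon)(p - \lambda) + (i_{pq}\gamma + i_{qq}\varepsilon)(q - \mu) \\
  &= (3.821)(0.1551) + (3.970)(0.0594) \\
  &= 0.8285, \text{ etc.,} \\
D &= (i_{pp}\gamma + i_{pq}\varepsilon)\gamma + (i_{pq}\gamma + i_{qq}\varepsilon)\varepsilon \\
  &= (3.821)(0.27) + (3.970)(0.08) \\
  &= 1.3493, \text{ etc.,} \\
\sigma^2 &= 1/nD, \\
x &= C/D.
\end{aligned}$$

The values of x can also be obtained from the graph, using

$$x = (P - \lambda)/\gamma.$$
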

These are shown under x′ in the table. They are remarkably close to 
the revised values, but it must be stated that this example happens to 
be unusually favourable to the graphical method, owing to the fact that 
the shape and orientation of the standard ellipse remains fairly constant 
along any line passing through the origin. 
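
The arithmetic for series I can be checked with a short sketch. The names `gamma` and `eps` (standing for the hypothetical white gene proportions 0.27 and 0.08), `lam`, `mu` and the helper `revised_estimate` are introduced here for illustration only and are not the author's notation; the bracketed sums 3.821 and 3.970 are taken from Table 2.

```python
# Illustrative check of C, D and x = C/D for series I.
# ASSUMPTION: gamma, eps are our names for the hypothetical white
# proportions (0.27, 0.08); lam = mu = 0 as stated in the text.
gamma, eps = 0.27, 0.08
lam, mu = 0.0, 0.0


def revised_estimate(p, q, a, b):
    """Return (C, D, x).

    a = i_pp*gamma + i_pq*eps and b = i_pq*gamma + i_qq*eps are the two
    bracketed sums tabulated for each series.
    """
    C = a * (p - lam) + b * (q - mu)
    D = a * gamma + b * eps
    return C, D, C / D


C, D, x = revised_estimate(p=0.1551, q=0.0594, a=3.821, b=3.970)
print(round(C, 4), round(D, 4), round(x, 3))
```

The printed values can be compared with the series I row of Table 2 (C, D and x).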

Finally, for computing $\chi^2$, we use 

$$w = n(i_{pp} i_{qq} - i_{pq}^2)/D,$$

$$\chi^2 = S\,w\{(q - \mu)\gamma - (p - \lambda)\varepsilon\}^2.$$




Each contribution to χ² is itself a χ² with one degree of freedom. 
The overall χ², with six degrees of freedom, cannot be considered as 
indicating significant departure of the data from the hypothesis pro- 
posed, although it is a little larger than one likes. The principal 
divergence is in series IV. Taken by itself, this would be considered 
significant but not when selected for being the most discrepant among six. 
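
As a rough check on these statements, the contributions can be recomputed from the tabulated weights and deviations. This is only a sketch: the scalings (w in units of 10⁴, deviations in units of 10⁻²) are read from the table header, and the two tail-probability helpers are our own closed forms for one and six degrees of freedom, not part of the original computation.

```python
import math

# (w x 10^-4, deviation x 10^2) for each series, as tabulated.
rows = {
    "I": (9.08, 0.363),
    "II": (5.76, -0.318),
    "III": (2.32, 0.645),
    "IV": (7.30, 0.770),
    "V": (7.97, -0.390),
    "L.W.": (5.71, 0.492),
}

# chi2 contribution = w * deviation^2, undoing the two scalings.
contrib = {s: (w * 1e4) * (d * 1e-2) ** 2 for s, (w, d) in rows.items()}
total = sum(contrib.values())


def chi2_tail_1df(x):
    """Upper tail of chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_tail_6df(x):
    """Upper tail of chi-square with 6 degrees of freedom (closed form)."""
    h = x / 2.0
    return math.exp(-h) * (1.0 + h + h * h / 2.0)


p_iv = chi2_tail_1df(contrib["IV"])
p_total = chi2_tail_6df(total)
print(round(total, 2), round(p_iv, 3), round(p_total, 3))
```

The total and the two probabilities can be set against the tabulated 9.67, "4%" and ">10%".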
The graphical method described above is perhaps unnecessarily 
elaborate. There would be no objection to a research worker, with no 
taste for graphical constructions, finding preliminary estimates by 
choosing the points on the regression line nearest to the plotted points. 
Or indeed, dispensing entirely with the graph, he could use $P = p$ and 
$Q = \mu + \varepsilon(p - \lambda)/\gamma$. It may happen however that, with less accurate 
preliminary estimates, it becomes advisable to repeat the process of 
approximation. The same will be true if the shape and orientation of 
the standard ellipse varies considerably over the field. From the values 
obtained for x, we find $P = \lambda + \gamma x$ and $Q = \mu + \varepsilon x$. Using these as 
new preliminary estimates, the whole process can then be repeated. 


3.3 Problem: are the “local whites” really white? 
For the series L. W., we have found 

$$x = 0.767 \pm 0.072.$$

The suggestion is that $x = 1$. We find 

$$\frac{1 - 0.767}{0.072} = 3.2.$$

The divergence is over three times its standard error. Unless we are 
prepared to revise our ideas of the genetic proportions in a typical white 
population, any claim that these people are of pure white “blood” can 
be confidently rejected. 

Notice that we might have tested the divergence from the hypothetical 
proportions (0.27, 0.08) by means of a χ² of two degrees of 
freedom, as in section 3.2 of Stevens (1950). The present test is 
however more sensitive, because it responds specifically to the effect of 
mixture with a race without A or B. 
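
The test amounts to a single ratio, sketched below. The one-sided normal tail probability is an addition of ours (the text quotes only the ratio), and the variable names are illustrative.

```python
import math

# Estimate and standard error for the L. W. series, as quoted in the text.
x_hat, se = 0.767, 0.072

# Divergence from the hypothesis x = 1, in units of its standard error.
z = (1.0 - x_hat) / se

# ASSUMPTION: a one-sided normal tail is used here purely as an illustration.
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
print(round(z, 2), p_one_sided)
```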


3.4 Problem: is there any statistical objection to combining series I, 
II and III? 


Using n, the numbers of observations, as weights, we calculate the 
weighted mean, $\bar{x}$, and the weighted sum of squares, $S_{xx}$, of deviations 
from the mean of the three values of x: 

$$\bar{x} = S(nx)/S(n) = 0.580,$$

$$S_{xx} = S\,n(x - \bar{x})^2 = 1.79.$$

Then, taking 

$$P = \lambda + \gamma\bar{x} = 0.157,$$

$$Q = \mu + \varepsilon\bar{x} = 0.046,$$

we have the information matrix, 

$i_{pp} = 13.9$, $i_{pq}$ = [value illegible in source], $i_{qq} = 45.0$, 

whence 

$$D = 1.40,$$
$$\chi^2 = D\,S_{xx} = (1.40)(1.79) = 2.51.$$

With two degrees of freedom, this is not significant. There is therefore 
no statistical objection to pooling these three series. Again, this test 
is more sensitive than the more general test of section 3.41 of Stevens 
(1950). The proportion of white “blood” in the combined populations 
is estimated at $\bar{x} = 0.580$. A slightly more accurate estimate would be 
obtained by adding the phenotypic frequencies and estimating x directly 
from the pooled data. 
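
The pooling test can be retraced approximately as follows. The sample sizes n are not restated in this section, so the sketch recovers them from the tabulated variances through σ² = 1/(nD); that back-calculation is an assumption of ours and is subject to rounding in the table.

```python
import math

# Tabulated x, sigma^2 and D for the three series to be pooled.
series = {
    "I": {"x": 0.614, "var": 0.00342, "D": 1.3493},
    "II": {"x": 0.471, "var": 0.00566, "D": 1.6666},
    "III": {"x": 0.650, "var": 0.01333, "D": 1.2936},
}

# ASSUMPTION: n is back-calculated from sigma^2 = 1/(n*D).
n = {s: 1.0 / (v["var"] * v["D"]) for s, v in series.items()}

# Weighted mean and weighted sum of squares of deviations.
x_bar = sum(n[s] * series[s]["x"] for s in series) / sum(n.values())
s_xx = sum(n[s] * (series[s]["x"] - x_bar) ** 2 for s in series)

# D evaluated at the pooled estimate, as quoted in the text.
D_pooled = 1.40
chi2 = D_pooled * s_xx

# Upper tail of chi-square with 2 degrees of freedom is exp(-chi2/2).
p_value = math.exp(-chi2 / 2.0)
print(round(x_bar, 3), round(s_xx, 2), round(chi2, 2), round(p_value, 2))
```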


3.5 No assumption made about the genetic proportions in the Indian 
and white populations 


Instead of postulating the genetic composition of the original pure 
Indian and white populations, we may like to ask: can we suppose that 
these series came from populations produced by blending, in varying 
proportions, two homogeneous populations of unknown composition ? 

If so, we have 


$$Q = m + bP.$$


The preliminary regression line may be adjusted, by eye, to pass as well 
as possible through the plotted points (p,q). We found 


$$m = 0,$$

$$b = 0.38.$$

The tangent of the angle of slope, for the conjugate direction, is 

$$-\frac{i_{pp} + i_{pq} b}{i_{pq} + i_{qq} b} = -\frac{15 + (2)(0.38)}{2 + (41)(0.38)} \approx -0.90.$$

TABLE 3 

PRELIMINARY ESTIMATES AND INFORMATION MATRIX 

SERIES   P       Q       i_pp   i_pq   i_qq    i_pp + i_pq·b   i_pq + i_qq·b
I        0.155   0.059   14.2   2.3    35.4    15.074          15.752
II       0.118   0.045   18.1   2.2    45.8    18.936          19.604
III      0.164   0.062   13.5   2.3    33.7    14.374          15.106
IV       0.060   0.023   34.5   2.1    88.3    35.298          35.654
V        0.034   0.013   60.0   2.1    155.4   60.798          61.152
L. W.    0.194   0.074   11.6   2.3    28.6    12.474          13.168

SERIES   p        q - m    C        D        w      P        wP
I        0.1551   0.0594   3.2736   21.060   5125   0.1554   796
II       0.1366   0.0287   3.1493   26.386   3311   0.1194   395
III      0.1560   0.0701   3.3013   20.114   1297   0.1641   213
IV       0.0405   0.0405   2.8736   48.847   3923   0.0588   231
V        0.0488   0        2.9669   84.036   4658   0.0353   164
L. W.    0.1923   0.0752   3.3890   17.478   3213   0.1939   623

Sw  = 21527    Swp  = 2454     Swq  = 890.9
SwP = 2422     SwPp = 347.8    SwPq = 129.76

Hence    21527m + 2454b  = 890.9
         2422m  + 347.8b = 129.76
giving   m = -0.0056
         b = +0.4118.

SERIES   w      q - m - bp   χ²
I        5125   +0.0011      0.01
II       3311   -0.0220      1.60
III      1297   +0.0115      0.17
IV       3923   +0.0294      3.39
V        4658   -0.0145      0.98
L. W.    3213   +0.0016      0.01
Total                        6.16
The preliminary estimates, P and Q, may then be found graphically, as 
in the earlier example. The rest of the computation is shown in 
Table 3, where we note that 

$$C = (i_{pp} + i_{pq} b)p + (i_{pq} + i_{qq} b)(q - m),$$
$$D = (i_{pp} + i_{pq} b) + (i_{pq} + i_{qq} b)b,$$
$$w = n(i_{pp} i_{qq} - i_{pq}^2)/D,$$
$$P = C/D,$$
$$\chi^2 = S\,w(q - m - bp)^2.$$


The m and b in this last expression are the revised, not the preliminary, 
estimates. 

The χ² of 6.16 with 4 (= 6 - 2) degrees of freedom corresponds to 
a level of significance of about 20 per cent. The answer to the question 
put above is therefore, “Yes.” 

In this example, the solution converges rather slowly. The trouble 
arises principally from series V, where the gene B is absent. A small 
change in the preliminary estimates, P and Q, for this series provokes 
a big change in the information matrix. But it did not seem worth 
while to repeat the process of approximation: however closely we 
converge onto the solution, the χ² must always be treated with caution when 
the numbers of A or B genes are so small. 
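
The revised m and b, and the final χ², can be reproduced from the sums in Table 3 with a few lines. The helper `solve_2x2` (plain Cramer's rule) and the tail formula for four degrees of freedom are our own illustrative choices; the inputs are the tabulated sums, weights and residuals.

```python
import math


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*m + a12*b = b1, a21*m + a22*b = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    m = (b1 * a22 - a12 * b2) / det
    b = (a11 * b2 - a21 * b1) / det
    return m, b


# Normal equations: Sw*m + Swp*b = Swq and SwP*m + SwPp*b = SwPq.
m, b = solve_2x2(21527, 2454, 890.9, 2422, 347.8, 129.76)

# (w, q - m - b*p) for each series, as tabulated.
residuals = [
    (5125, 0.0011), (3311, -0.0220), (1297, 0.0115),
    (3923, 0.0294), (4658, -0.0145), (3213, 0.0016),
]
chi2 = sum(w * r * r for w, r in residuals)

# Upper tail of chi-square with 4 degrees of freedom (closed form).
h = chi2 / 2.0
p_value = math.exp(-h) * (1.0 + h)
print(round(m, 4), round(b, 4), round(chi2, 2), round(p_value, 2))
```

The tail probability comes out a little under one fifth, in line with the "about 20 per cent" quoted above.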
STUDIES OF INTERRACIAL CROSSING 


I. SPECTROPHOTOMETRIC MEASUREMENTS OF 
SKIN COLOR 


BY R. RUGGLES GATES 
Department of Anthropology, Harvard University 


THE study of skin color and its inheritance has long been beset with 
difficulties because of the lack of accurate standards of comparison. 
The von Luschan color blocks are unsatisfactory and cannot be used in 
scoring the skin colors in families of mixed descent. Many years ago, 
Davenport (1913) used the toy color top as a means of determining 
the amounts of black, red, yellow, and white in human skins, and this 
method is still occasionally used. But even as early as 1912 it was 
shown by Todd and Van Gorder to be inaccurate, partly because the 
red disc contains a considerable amount of black. When correction was 
made for this, Davenport’s curve for skin color in mixed families lost 
its bimodal character. 

Davenport’s two-factor hypothesis of skin color inheritance has thus 
been in need of confirmation for many years. Davenport himself said 
(1913) that the Negro may have “a myriad” of skin color factors. 
From observations (Gates, 1928) of Indian-white crosses in Northern 
Canada, the conclusion was reached that one of the skin color factors 
also affects eye color, and from later (unpublished) studies of these and 
other crosses it was concluded that at least three factors (genes) are 
necessary to account for the Indian skin color. At the same time, 
striking instances of segregation in skin color in crosses of Indians and 
Eskimos with whites were recorded (Gates, 1929). 

More recently (Gates, 1949), a colored chart of skin colors of a 
selected series of eight individuals was published, from persons of 
black X white ancestry. The individuals chosen were equally spaced, 
as far as possible, between pure black and pure white skin color. At 
the same time, a three-factor hypothesis of skin color inheritance was 
developed, which can now be tested by the use of the color chart frontispiece, 
in any part of the world where skin color segregation in racial 
crosses is taking place. The next step was to apply the spectrophoto- 
meter, which gives a characteristic reflectance curve for each shade of 
color. 

SPECTROPHOTOMETER REFLECTANCE CURVES 


The first step in this procedure was to obtain reflectance curves of 
the original paintings on canvas in comparison with those of the colors 
as reproduced on paper in Gates (1949). These comparisons were 
kindly made at first on the photo-electric recording spectrophotometer 
of the International Printing Ink Company, Cambridge, Mass. There 
are certain differences between this instrument and the one at the Color 
Measurement Laboratory of the Massachusetts Institute of Technology, 
the latter giving a wider spread of the spectrum. For the sake of 
accurate comparison, Figs. 1, 2, 9 and 10 were therefore taken on the 
M. I. T. instrument, Figs. 3 through 8 on that of the International Ink 
Company. 

Fig. 1 shows the reflectance curves, in visible light, of the nine 
squares as reproduced in ink (the frontispiece). No. 1, representing a 
practically full black skin, shows the least reflectance in all parts of the 
spectrum. No. 3 is decidedly more reddish than Nos. 1 and 2, as the 
curves show. Nos. 4 and 5 are browns, their curves closely parallel 
throughout, but a little closer together in the green. No. 6 is a lighter 
shade of brown and No. 7 is decidedly yellow. These two curves diverge 
more widely in the yellow and red. Nos. 8 and 9 should be approximately 
alike. No. 8 represents a white-skinned segregate descended from a cross 
between the white and black races. No. 9 represents the skin of a white 
man. They are nearly parallel but quite widely apart because No. 9 is 
too light, i.e., through a mistake in matching the colors, it is lighter 
than the original painting on canvas which it is supposed to represent. 

In Fig. 2 the instrument was set so that the ordinates of the curves 
for Nos. 1, 2 and 3 are multiplied by a factor of 5, so as to make the 
differences between them more obvious. It will be seen that Nos. 1 and 
2 cross in the green and Nos. 1 and 3 in the blue. In Figs. 3-6 the 
reflectance curves for the original paintings in oils on canvas are com- 
pared respectively with the imitations of those colors in inks on paper. 
The divergences are generally small (again with the exception of No. 9, 
Fig. 6, which is too light). Although as nearly alike as the eye could 
determine, Nos. 1-8 show certain divergences. In Nos. 1-3 the curves 
cross each other twice, once in the blue and again in the red, but the 
ink in general gives somewhat less reflectance than the painting. Nos. 
4-6 approach each other or touch in the blue, and cross in the red. 
Nos. 7-8 (Figs. 5 and 6) approach in the blue and nearly meet in the 
red. In general, while the ink reproduction on paper gives a little 
more reflectance than the artist’s colors on canvas, the two approach to 
practical identity at about 500 mμ and 700 mμ wave-length. The 
differences are small and they are consistent, indicating in the darker 
shades more absorption at the violet end and less at the red end of the 
spectrum. 

FIG. 7. REFLECTANCE CURVES FOR TWO COLORED MEN WHO WERE RESPECTIVELY #5-6 AND #5 IN THE COLOR CHART (FRONTISPIECE) 


The next step was to score certain colored individuals on the ink 
color-chart and then obtain their reflectance curves on the spectrophoto- 
meter. Fig. 7 is the reflectance curve of a colored man whose skin 
color (malar region, above the shaving area) was between 5 and 6 on 
the color-chart. His skin was sufficiently unpigmented so that the dips 
in the curve between 520 and 580 mμ, due to oxyhemoglobin, are 
indicated. Fig. 7 also represents another man, who was scored as No. 5 
on the color-chart. The curve is closely parallel but a little “ darker ” 
throughout, and the dips for hemoglobin are more obscure owing to the 
slightly larger amount of melanin in the skin. The graphs in Figs. 
3-8 were all taken on the spectrophotometer of the International Ink 
Company, Cambridge, Massachusetts. 

Fig. 8 gives the reflectance curves for the author’s cheek on June 9 
and on September 21, 1949, showing no tanning effect. Edwards and 
Duntley (1939b) studied the effects of a one-hour exposure of a white 
skin to the mid-day summer sun, showing the hyperemia after eleven 
hours and the increased melanism up to nine months. As is well known, 
human skins differ greatly in tannability, and this may also be inherited. 

FIG. 8. COMPARISON OF THE SAME WHITE SKIN (CHEEK) IN JUNE AND SEPTEMBER, 1949, SHOWING ABSENCE OF TANNING EFFECT 

In Fig. 9 are shown the reflectance curves of four other colored men 
who were scored on the color-chart as Nos. 1-2, 5, 4 and 6-7 respectively 
(from below upward). They were then taken to the spectrophotometer 
at M.I.T. and their reflectance curves correspond, only the lightest 
showing an indication of the oxyhemoglobin dips between 520 and 
580 mμ. The man classed as No. 5 in the color-chart showed a little 
more absorption in the red end of the spectrum than the one classed 
as No. 4. The one scored as 1-2 was almost full black, while 6-7 was a 
colored man with markedly light skin color. 

In Fig. 10 are shown the reflectance curves for four Chinese, on the 
M.I.T. instrument. Three of them (solid lines) were brothers and 
their curves are nearly parallel throughout. The fourth (broken line) 
was a little darker, but all were nearest No. 8 (white segregate) in the 
color chart. The three brothers all had straight black hair and “black” 
eyes, with the eyefold. The lowermost of the three (Fig. 10) had very 
oblique eyes with the eyefold less marked. The fourth (unrelated) man 
had somewhat kinky hair, indicating a Negroid ancestor, and brown eyes, 
but a skin color scarcely darker than the three brothers. I am indebted 
to Dr. E. M. Hartl for aid in getting in touch with eight of the Negroes 
and Chinese whose curves are here recorded. 

FIG. 9. REFLECTANCE CURVES FOR FOUR OTHER COLORED MEN 
Two of them (#1-2 and 6-7) fell between numbers on the color chart. 

FIG. 10. REFLECTANCE CURVES (CHEEK) OF FOUR CHINESE 
The solid lines represent brothers, the broken line an unrelated man. 


DISCUSSION 


In the paper of Edwards and Duntley (1939a), they made numerous 
comparisons of the reflectance curves from different parts of the human 
body in white males and females, with a few comparisons of Japanese 
and mulattos. Using similar methods, they found (1949) that in 
women the skin reacts as a whole to female sex hormones. Following 
ovariectomy or castration, the sexual differences in skin color of Cau- 
casians remain, indicating that these differences are genically determined 
and not entirely under hormone control. 


The present paper aims, by use of the spectrophotometer, to make 
more precise the analysis of the units involved in the inheritance of 
skin color in racial crosses. Having obtained a series of skin colors 
of individual colored persons more or less equally spaced from pure 
black to pure white skin, this chart is used for scoring the skin color 
of other segregates and then comparing their spectrophotometer reflec- 
tance curves. The original color chart (in Gates, 1949) is available 
for the classification of families anywhere in the world in which genetic 
segregation for skin color is taking place. The three-gene hypothesis 
already proposed can thus be tested. The results here recorded show 
good agreement between the color squares and the reflectance curves of 
new individuals chosen at random. The number of color shades which 
can be accurately scored will only be determined by further experience. 
As the amount of melanin in the skin of an individual is affected not 
only by the environment (directly by the light) but also by the state of 
health of the individual, and as the genetic determination may also be 
quite indirect, it remains to be determined how clearly segregation- 
points can be demonstrated in the melanin series. A general account 
of the physiology, embryology and biochemistry of human skin pig- 
mentation is being published elsewhere. 
SUMMARY 


This paper represents several new developments in the study of skin 
color inheritance: (1) the skin colors of eight colored individuals, from 
black to white, were painted in oils on canvas by a portrait artist; 
(2) these squares of color were reproduced as a frontispiece (Gates, 1949) 
on paper, by the mixing of inks; (3) in the present communication (Figs. 
3-6) the reflectance curves for each block of color on canvas are com- 
pared with the curve for the corresponding block as reproduced in ink, 
and shown to be closely similar (except the last, No. 9 in the series, as 
explained in the text) ; (4) six colored men, ranging from dark to light, 
were scored on the color chart and their reflectance curves (from the 
cheek) were then recorded on the spectrophotometer at M. I. T. (Figs. 
7, 9), thus showing that the color blocks as reproduced in the frontispiece 
can be used to determine how segregation of skin color occurs in 
mixed families in any country; and (5) the corresponding reflectance 
curves are given (Fig. 10) for four Chinese, three of whom were 
brothers. The hemoglobin curve of these four is clear and the yellowish 
skin color may be due entirely to dilute melanin. 

Copies of the frontispiece of color blocks may be obtained from the 
publishers, The Blakiston Co., Philadelphia, for a nominal sum. The 
blocks can then be cut apart for use, so that the eye will not be affected 
by adjacent colors. 
STUDIES IN THE HUMAN SEX RATIO 
2. THE PROPORTION OF UNISEXUAL SIBSHIPS * 


BY MARIANNE E. BERNSTEIN 
Fulbright Fellow 


IN sexually reproducing organisms we assume that during normal 
meiosis the two kinds of gametes of the heterogametic sex are pro- 
duced in equal number. That this is true by and large is borne out by 
the fact that, in general, male and female offspring are produced in 
equal or almost equal numbers (105.5 males to 100 females in man, 
and 100 males to 100 females in Drosophila obscura). However, in the 
drosophila fly a whole series of genetic conditions have been discovered 
in which the sex ratio was found to vary from almost all male to almost 
all female offspring. For instance, Gershenson (1928) discovered certain 
genetic forms of D. obscura where males of this strain produced about 
96 per cent female offspring. Sturtevant and Dobzhansky (1936) studied 
a similar condition in a mutant form of D. pseudo-obscura and showed 
that this abnormal sex ratio was due to the action of a single gene on 
the X chromosome. This gene causes equatorial division of the X 
chromosome bearing it during both meiotic divisions and causes degen- 
eration of most of the Y chromosomes. Thus almost all sperms carry 
an X chromosome and are therefore female-producing. 

The question is raised almost at once whether similar genetic varia- 
tions of meiosis occur in man and cause an excess of unisexual sibships. 
Such a statistically significant excess of unisexual sibships was apparently 
found by Geissler (1889) on a very large set of data of families reporting 
births in the kingdom of Saxony in the years 1876 to 1885. This study 
of Geissler has been subject to much discussion and criticism. Using 
more refined statistical methods than Geissler, Gini (1908) has shown 
on Geissler’s data and also on data from the city of Dresden and from 


* This manuscript forms Part II of an accepted Ph. D. thesis at the Istituto di 
Statistiche, Rome, Italy. 
France that there is a surplus of families with children of only one sex, 
predominantly of one sex, and of families where all but the last child 
are of one sex. 

In two American studies, different results were obtained. In 1937 
Rife and Snyder made a house-to-house survey on the distribution of 
sons and daughters for 1269 Ohio families. They found that the devia- 
tions of the observed from the expected proportion of unisexual sibships 
were so small as to be statistically insignificant. Recently Myers (1949) 
using data from Who’s Who in America substantiated the Rife-Snyder 
results. 

This apparent difference between the results of the studies by Geissler 
and Gini on the one hand and the American biometricians on the other 
has frequently been discussed (Gates 1946, Stern 1949), and it was 
stated that probably the amount of data used by Rife-Snyder (and now 
by Myers) was too small to yield the statistically significant results of 
the much larger set of data. However, in the present paper a simpler 
explanation will be given, and the data of Myers will be reanalyzed in 
the light of certain sociological-psychological factors which have been 
overlooked in all previous papers. 

Let p stand for the proportion of male births and q for the proportion 
of female births in the data studied. Let k stand for the size of 
sibship and $N_k$ for the total number of families with k offspring. By 
expanding the term $N_k(p + q)^k$, Geissler, Rife-Snyder, and Myers have 
found the frequency of families of size k with 0, 1, ..., k sons to be 
expected if no genetic factors operated in favor of unisexual sibships. 
In particular, $N_k p^k$ was assumed to be the frequency of families with k 
sons and no daughters and $N_k q^k$ the frequency of families with k 
daughters and no sons. 

In an earlier paper (1948) this author pointed out that this is not 
entirely correct if we take into account Gini’s (1911) findings that the 
last child has a tendency to balance the sex distribution. Dahlberg 
(1948) queried expectant parents as to whether they wanted their 
unborn child to be a boy or girl. He found that the answer depended 
almost entirely on the sex of their older children. If the older children 
were all or almost all of one sex, then about 90 per cent of the parents 
hoped that their new baby would be of the opposite sex. Myers (1949) 
pointed out that in both the Rife-Snyder and his own data the two- 
child families of one boy and one girl were much more frequent than 
expected, so that in two-child families there is actually a deficiency of 
unisexual sibships. In Geissler’s data, on the other hand, the excess of 
unisexual sibships is observed in two-child families as well as in all 
larger sibships. The failure to reconcile this difference in two-child 
families accounts for the two different conclusions reached in Europe 
and America, a difference due to the greater use of birth control in 
the twentieth century. 

From what we have said it should be clear that we cannot simply 
compare the sex distribution of families of a given size with theoretical 
frequencies from a binomial distribution. We ought to take account 
of the frequency with which an additional child is born in families of 
different sex composition. A general algebraic development is not diffi- 
cult, but is clumsy if we introduce a symbolism adequate to allow for 
all possible sex configurations. We therefore simply indicate how the 
process starts; the continuation will be obvious. 

Consider a set of families having a first child. Of these there will 
be Np families with a boy and Nq families with a girl. But not all 
first children are followed by second children. Of the Np families with 
a boy, only a fraction, say $D_m$, will have a second child; and of the Nq 
families with a girl, a fraction, say $D_f$, will have a second child. It 
may or may not be true that $D_m$ and $D_f$ are equal. Now in the absence 
of genetic or other complications affecting the probability of a male 
birth, we shall have, after all second children are born: 

One-child families:             One boy      $Np(1 - D_m)$
                                One girl     $Nq(1 - D_f)$
Two (or more)-child families:   Two boys     $Np^2 D_m$
                                Boy-girl     $Npq D_m$
                                Girl-boy     $Nqp D_f$
                                Two girls    $Nq^2 D_f$

Similar expressions may be written for third children, introducing 
additional D’s, $D_{mm}$, $D_{mf}$, $D_{fm}$, $D_{ff}$, for the fraction of families having a 
third child following the indicated configuration of first two children. 
We shall have, after all third children are born: 

Two-child families (completed): 
MM    $Np^2 D_m(1 - D_{mm})$
MF    $Npq D_m(1 - D_{mf})$
FM    $Nqp D_f(1 - D_{fm})$
FF    $Nq^2 D_f(1 - D_{ff})$
Three (or more)-child families: 
MMM   $Np^3 D_m D_{mm}$
MMF   $Np^2 q D_m D_{mm}$
etc. 
FFF   $Nq^3 D_f D_{ff}$


From the above it is evident that if $D_{mm}$, $D_{ff}$ are larger than $D_{mf}$, 
$D_{fm}$, there will be a deficiency of unisexual sibships in completed 
two-child families, as compared with the binomial expectation. 
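
The effect described here can be illustrated numerically. In the sketch below every numerical value is hypothetical (chosen only to make the point), and the function and argument names are ours; the four expressions are those given above for completed two-child families.

```python
def completed_two_child(N, p, Dm, Df, Dmm, Dmf, Dfm, Dff):
    """Expected numbers of completed two-child families by birth order."""
    q = 1.0 - p
    return {
        "MM": N * p * p * Dm * (1.0 - Dmm),
        "MF": N * p * q * Dm * (1.0 - Dmf),
        "FM": N * q * p * Df * (1.0 - Dfm),
        "FF": N * q * q * Df * (1.0 - Dff),
    }


def unisexual_share(freq):
    """Fraction of completed two-child families that are MM or FF."""
    return (freq["MM"] + freq["FF"]) / sum(freq.values())


# HYPOTHETICAL inputs: p = 0.5 so that the binomial share of MM + FF is 0.5.
equal = completed_two_child(1000, 0.5, 0.8, 0.8, 0.50, 0.50, 0.50, 0.50)
unequal = completed_two_child(1000, 0.5, 0.8, 0.8, 0.55, 0.45, 0.45, 0.55)

share_equal = unisexual_share(equal)
share_unequal = unisexual_share(unequal)
print(share_equal, share_unequal)
```

With equal continuation fractions the share of unisexual sibships equals the binomial value; when like-sexed pairs more often go on to a third child, the share among completed two-child families falls below it.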

In order to test this theory we used the same data as were used by 
Myers (1949) though we found it necessary to increase the sample size 
from the 999 families used by Myers to 7616 families in order to get 
statistically significant results. Also, in order to get reliable estimates 
of the various D’s, we confined ourselves to marriages started between 
1910 and 1935 and hoped thus to have only completed families and not 
too many families with deceased children. These restrictions are dictated 
by common sense and can in no way account for the different results 
obtained by Myers and this author. 

As a matter of fact, if we do not correct for the various D’s, our data 
are very similar to those of Myers. Table 1 shows the number observed 
for each sex combination in our two-child families together with those 
to be expected from the expansion of the term $N(p + q)^2$. We observe 
the same deficiency of two-child unisexual sibships in the data of Table 1 
as was observed by Myers.† 


TABLE 1 

Sex combinations in two-child families 

TYPE                NUMBER OBSERVED   BINOMIAL FORMULA   BINOMIAL EXPECTATION
Two boys            798               3003p²             826.43
One boy, one girl   1 555             3003(2)pq          1 497.89
Two girls           650               3003q²             678.68
Total               3 003                                3 003.00


† The question arises as to the accuracy of the listings in Who’s Who in 
America. For our treatment of the data it is most important that the birth order 
of the children be correctly given. Unintentional inaccuracies will of course 
balance themselves and have no effect on the results. The most likely intentional 
inaccuracy would be the one where the father of many daughters and “finally” 
The simplest way of overcoming the effect of the various D’s on the 
sex combinations in sibships is to compute the different incidences of 
male birth for the k-th child for the different types of families with 
(k - 1) children. Making such a computation in Table 2 we see that 
the proportion of males among second children is significantly higher 
in families in which the first child is a boy than in families in which 
the first child is a girl. 


TABLE 2 

Incidence of male birth in second children 

SEX OF         SEX OF SECOND CHILD                    NUMBER OF
FIRST CHILD    Male      Female    p₂ ± S.E.          ONE-CHILD FAMILIES
Male           1 686     1 444     .5387 ± .0089      835
Female         1 428     1 435     .4988 ± .0093      788


The difference of 4.0 per cent in $p_2$ has a standard error of 1.3 per cent 
and is therefore statistically highly significant. 
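
The proportions, their standard errors and the standard error of the difference can be recomputed from the counts in Table 2. The sketch assumes the usual binomial standard error and independence of the two groups; the helper name `proportion_with_se` is ours.

```python
import math


def proportion_with_se(males, females):
    """Proportion of males and its binomial standard error."""
    n = males + females
    p = males / n
    return p, math.sqrt(p * (1.0 - p) / n)


# Second children, by sex of the first child (Table 2).
p_after_boy, se_after_boy = proportion_with_se(1686, 1444)
p_after_girl, se_after_girl = proportion_with_se(1428, 1435)

diff = p_after_boy - p_after_girl
# ASSUMPTION: the two groups are treated as independent samples.
se_diff = math.sqrt(se_after_boy ** 2 + se_after_girl ** 2)
print(round(p_after_boy, 4), round(p_after_girl, 4),
      round(diff, 3), round(se_diff, 3))
```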

Why then did we find in Table 1 not an excess but rather a deficiency 
of unisexual two-child families? This discrepancy, according to the 
above theory, must be due to the fact that in our data the reproduction 
rate after the birth of the second child is greater in families where the 
first children are of like sex than in families where there is already a son 
and daughter. In other words, in our data $D_{mm}$ must be larger than $D_{mf}$, 
and $D_{ff}$ must be larger than $D_{fm}$. We compute these four proportions 
from the data of Table 3. 


TABLE 3 

Sex combinations after the birth of all third children 

SEX OF           SEX OF FIRST TWO CHILDREN
THIRD CHILD      MM       MF, FM    FF
Male             511                391
                          1304
Female           376                394
None             799      1568      650
Total            1686     2872      1435
D_mm, etc.       .526     .454      .547

one son would list the son first. This kind of inaccuracy would decrease the pro- 
portion of unisexual sibships; therefore our data may be even better than they 
appear. Incidentally, the charge of careless answers by parents has been leveled 
recently against Geissler’s data also, by H. O. Lancaster, Ann. of Eugenics, Vol. 15 
(2), 1950. 
If (as is nearly true) we consider that $D_{mm} = D_{ff}$ and that $D_{mf} = D_{fm}$, 
we have 

$$D_{mm}, D_{ff} = \ldots \pm .0089,$$

$$D_{mf}, D_{fm} = .4541 \pm .0093.$$

It is clear that more families have a third child if the first two children 
are of the same sex than if they are of opposite sex. (In this material 
the excess of $D_{ff}$ over $D_{mm}$ is not significant. There are considerable 
national and social variations in this, however, which will be considered 
in another paper.) 
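
The continuation fractions can be recomputed directly from the counts in Table 3, which also gives a pooled figure for like-sexed pairs. This is a sketch under the assumption that the pooled value is simply the combined MM and FF counts; it is our recomputation, not a figure quoted from the source.

```python
import math

# (families with a third child, all families) by sex of first two children.
third = {
    "MM": (511 + 376, 1686),
    "MF/FM": (1304, 2872),
    "FF": (391 + 394, 1435),
}
D = {k: had / total for k, (had, total) in third.items()}

# ASSUMPTION: like-sexed pairs pooled by adding MM and FF counts.
like_had = third["MM"][0] + third["FF"][0]
like_total = third["MM"][1] + third["FF"][1]
D_like = like_had / like_total
se_like = math.sqrt(D_like * (1.0 - D_like) / like_total)

print({k: round(v, 3) for k, v in D.items()}, round(D_like, 4),
      round(se_like, 4))
```

The three separate fractions can be compared with the last row of Table 3, and the pooled standard error with the ± .0089 quoted above.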

Considering now the proportion of male third births, we find: 


First two births male, $p_3 = .576 \pm .017$, 
First two births female, $p_3 = .498 \pm .018$. 


The difference between these is .078 with a standard error of .024, and 
is statistically significant. Numerically it is nearly twice as great as 
the corresponding difference in $p_2$ computed earlier. 

In the case of two-child families, it is possible to examine the situation 
in another way, by studying the sex distribution of the first two 
children in families of two or more children. This is possible because, 
as appears from Table 2, there is essentially no difference between $D_m$ 
and $D_f$. We should therefore expect, if the probability of a male birth 
is the same for all families, to find that the sex combinations of all first 
two children are distributed binomially. The actual distribution is 
given in Table 4. 


TABLE 4 

Sex combinations of all first two children 

TYPE                NUMBER OBSERVED   NUMBER EXPECTED
Two boys            1686              1626.6
One boy, one girl   2872              2991.0
Two girls           1435              1375.4

It will be observed that there is a substantial excess of families for 
which the first two children are of like sex. Testing by χ² we find 
χ² = 9.46, well beyond P = .01, in agreement with our previous results. 
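
The goodness-of-fit statistic can be retraced from Table 4. The sketch below uses the tabulated expectations as given; recomputing from those rounded figures gives a value within a few hundredths of the quoted 9.46. Treating the statistic as having one degree of freedom (three classes, with the total and p fitted) is our assumption.

```python
import math

# Observed and expected counts from Table 4.
observed = {"two boys": 1686, "one of each": 2872, "two girls": 1435}
expected = {"two boys": 1626.6, "one of each": 2991.0, "two girls": 1375.4}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# ASSUMPTION: 1 degree of freedom; upper tail via the complementary
# error function.
p_value = math.erfc(math.sqrt(chi2 / 2.0))
print(round(chi2, 2), p_value)
```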

Thus the author concurs with Geissler and Gini in believing that sex 
distribution in human sibships is not at random, for in two-child families 
and three-child families there is an excess of unisexual sibships. However, 
this finding is not identical with proving that this excess is due to 
a significant proportion of parents whose constitution allows them to 
have children of only one sex. If there is variation in the probability 
of male births from family to family, the same kind of effect will follow. 
Let us assume, for a moment, that our data consist of two distinct 
subpopulations of size $N_1$ and $N_2$, each with its own p, such that $p_1 \neq p_2$. 
For such a set of data, the theoretical frequency of unisexual two-child 
sibships is not 

[...]    (1)

but rather 

[...]    (2)

Since (1) is less than (2) when $p_1 \neq p_2$, it may well be that the excess 
of unisexual sibships observed in the above data is the result of heterogeneity 
in p among families or groups of families. 
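
The argument can be illustrated with a small sketch. The two functions below are one plausible reading of the comparison (a single binomial at the pooled p against the sum of two binomials, one per subpopulation); they are our reconstruction for illustration, not the expressions printed in the source, and all numerical inputs are hypothetical.

```python
def unisexual_pooled(N1, p1, N2, p2):
    """Expected unisexual two-child sibships if one pooled p applied to all."""
    N = N1 + N2
    p = (N1 * p1 + N2 * p2) / N
    return N * (p * p + (1.0 - p) ** 2)


def unisexual_mixture(N1, p1, N2, p2):
    """Expected unisexual two-child sibships with each group keeping its own p."""
    return (N1 * (p1 * p1 + (1.0 - p1) ** 2)
            + N2 * (p2 * p2 + (1.0 - p2) ** 2))


# HYPOTHETICAL subpopulations.
N1, p1, N2, p2 = 1000, 0.45, 1000, 0.60
pooled = unisexual_pooled(N1, p1, N2, p2)
mixture = unisexual_mixture(N1, p1, N2, p2)

# Algebraically the excess equals 2*N1*N2*(p1 - p2)^2 / (N1 + N2),
# which is never negative.
excess = mixture - pooled
print(pooled, mixture, excess)
```

Under this reading the heterogeneous population always shows at least as many unisexual sibships as the pooled binomial predicts, with equality only when the two proportions coincide.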

Several factors in favor of this theory have become apparent in this 
and our earlier study: 


(1) The data by Geissler, comprising birth data for the entire 
population of Saxony, are undoubtedly more heterogeneous than the 
present data. In line with this, we find the excess of unisexual sibships 
and of sibships in which there is a marked predominance of one sex, 
greater in Geissler’s than in our data. 


(2) Families collected by us for a previous study (1948) from 
Who’s Who in Commerce and Industry are undoubtedly more homogeneous 
than the present data. These data did not yield any excess of 
unisexual sibships. Still more homogeneous probably are the data used 
by Rife-Snyder which, according to Rife (1949), were all collected in 
one neighborhood of a suburb of Columbus, Ohio. This suburb, according 
to Rife, was almost entirely inhabited by middle class families of Italian 
descent. In this group, sons and daughters were distributed at random. 


(3) Parkes (1921) made an extensive search for families which for 
several generations maintain a very uneven sex ratio. Though he 
searched through many pedigrees, some going back to 1600 A. D., he 
found only six families which through several generations had an unusual 
sex ratio. 


(4) Lastly, this author believes she has found, in the present and 
newly collected data, evidence of genetic factors which, as part of their 
pleiotropic effect, seem to cause significant variation in the sex ratio of 
the offspring. This was investigated further in another study (1951). 


SUMMARY 


In 1889 Geissler published a large set of data on the distribution of 
boys and girls in human families, which has been subject to much dis- 
cussion. Gini (1908) showed that in Geissler’s families there is a signifi- 
cant excess of unisexual sibships as well as of families in which there 
is a marked predominance of children of one sex. Waaler (1928) 
showed on Geissler’s data that the proportion of male births for the k-th 
child varies significantly for the k types of families with (k - 1) children. 
Research along this line by Rife-Snyder (1937) and Myers (1949) gave 
opposite results: there was a lack of unisexual sibships in two-child 
families and a statistically insignificant lack in larger sibships. 

In the present paper the author shows how a desire on the part of 
the parents for at least one offspring of either sex changes the formula 
for the proportion of unisexual sibships in families with k children from 
$N_k(p^k + q^k)$ to $N(D_1 \cdots D_{k-1})(p^k + q^k)(1 - D_k)$, where $D_i$ stands 
for the proportion of families with i children all of one sex who have 
at least one more child. Using the above and similar formulae on the 
same data as were used by Myers (from Who’s Who in America), we 
demonstrated a statistically significant excess of unisexual sibships for 
data of about 6,000 families and showed that the excess of unisexual 
three-child families was also significant for data comprising about 3,000 
families. The author does not believe this excess of unisexual sibships 
to be due to a significant proportion of families who can have children 
of one sex only. Rather, the author presents some evidence for her 
belief that the excess of unisexual sibships is caused by the mixing 
in her data of several more homogeneous groups, each with its own 
distinct a priori probability of male birth. 


NOTES ON JAPANESE BIRTH STATISTICS 


BY EIJI TAKAHASHI 


Laboratory of Hygiene and Public Health 
Hirosaki University, Japan 


1. AGE OF MOTHER AND SEX RATIO AT BIRTH 


VARIOUS investigators have considered the relation between age of 
mother and sex ratio at birth (see Lowe and McKeown, 1950, for 
example). All such investigations have dealt with data from Occidental 
vital statistics. The present note presents data from Japanese official 
sources. 

Table 1 gives, for the seven years 1937-43, births by sex, sex ratios, 
and stillbirths by 5-year age-classes of mothers. Fig. 1 shows the varia- 
tion of sex ratio with age of mother. For comparison, the United States 
curve is also shown. 

Certain features of the data call for comment. Up to age 50 the 
data do not differ remarkably from the United States data, or from those 
of other countries. The sex ratio is nearly constant, but with a statisti- 
cally significant decline with increasing age of mother. In the two oldest 
age groups, however, the sex ratio is about 111 for mothers 50-54 and 
rises to 119 for mothers over age 55. It is to be noted that the number 
of births attributed to these mothers is substantial so that the rise cannot 
be attributed to sampling error. 

It is perhaps of interest to remark that births to mothers over 55 
have apparently been somewhat of a problem to the United States com- 
pilers of vital statistics. In the three years 1917-19 a total of 163 births 
to mothers 55 or over were reported, in a total of 4 million births. In 
1920 there were 8 such births reported out of 1.5 million births. In 
1921 and subsequent years through 1935, it is stated that verification of 
the age of mother was sought when the age was given as 55 or more. 
During this period only one birth was credited to this age period, al- 
though the original reports had averaged about 17 per year. Since that 
time the practice of verification seems to have been dropped, and the 
number of births to mothers 55 and over has varied from 1 to 67. 

A further peculiarity of the Japanese data is to be noted in the 
behavior of the stillbirth ratio with age of mother. It will be observed 
that this ratio falls to a low point of 3.68 for mothers 25-29, and rises 
to 7.40 for mothers 45-49, but that it falls to 4.13 for mothers 50-54, and 
to 3.35 for mothers 55 and over. This fall at the older ages is quite unlike 
anything elsewhere observed. 


Fig. 1. Sex Ratio at Birth, and Age of Mother 
(Vertical axis: sex ratio at birth; horizontal axis: age of mother) 

No completely satisfactory explanation for the reported facts for the 
mothers over 50 is at hand. It may conceivably represent an actual 
biological difference between Japanese and Occidental women. Or it 
may be that the Japanese data come closer to giving us the biological 
and physiological facts of human reproduction. (We have reason to 
suppose that Japanese women approach closer than Occidental women 
to the physiological limits of reproductive performance.) But it is also 
conceivable that not all these births are correctly attributed to these 
mothers. 


2. SEASONAL VARIATION OF THE SEX RATIO 


The existence of a seasonal variation in the sex ratio of births was 
reported by Düsing (3) in Prussia. Later workers, such as Bonnier (4), 
Prinzing (5), Ciocco (6), and Strandskov (7), have failed to find evidence 
of seasonal variation. In Japan such a tendency has been found 
by some investigators (Sugiure (8), Aizawa, Okamura, etc.). 


Fig. 2. Sex Ratio by Month (Average, 1908-1937) 

In the present note we examine the Japanese official figures for the 
30 years from 1908 to 1937. The end-points of the period were chosen 
because of the Russo-Japanese war and the Japanese astrological beliefs 
(Hinoeuma), relative to the year 1906, and because of the extension of 
the Sino-Japanese affair in 1938. During this period the absolute number 
of births in every month is between 100,000 and 280,000. 

Fig. 2 shows the sex ratio by months, averaged over the 30 years 
1908-37. The standard error of any single point is of the order of 0.3, 
so that there can be no question of the statistical significance of the 
seasonal variation. The pattern is quite different from anything reported 
from Occidental countries. To be noted are the low values from 
December to March (with an actual excess of females in March) and 
the rise to a peak of 112.5 in November. 

It has been stated by Düsing (3) that the sex ratio is lowest in the 
months of highest production of children, and vice-versa. Fig. 3 shows 
the seasonal variation of births (expressed as births per day per thousand 
mean daily births throughout the year). 
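
The index described in parentheses can be written out explicitly. The Python sketch below assumes that the index for a month is 1000 times that month's births per day, divided by the mean daily births over the whole year. The monthly counts are hypothetical placeholders, not the Japanese official figures, and a non-leap year is assumed; the function name `daily_birth_index` is likewise an illustration only.

```python
# Days per month, January to December, for a non-leap year (assumption).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def daily_birth_index(monthly_births, days=DAYS_IN_MONTH):
    """Births per day in each month per thousand mean daily births in the year."""
    mean_daily = sum(monthly_births) / sum(days)
    return [1000 * (births / n_days) / mean_daily
            for births, n_days in zip(monthly_births, days)]


# Hypothetical monthly birth counts, January to December (illustrative only).
births = [250_000, 220_000, 230_000, 170_000, 160_000, 150_000,
          160_000, 170_000, 165_000, 170_000, 165_000, 150_000]

index = daily_birth_index(births)
for month, value in zip("JFMAMJJASOND", index):
    print(month, round(value))
```

Dividing by the number of days removes the effect of unequal month lengths, so a month with exactly average daily births scores 1000 regardless of whether it has 28 or 31 days.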

This seems to hold for the Japanese data: births are highest in the 
period from January to March, when the sex ratio is low. The exception 
is the November peak in the sex ratio. It is conceivable that the 
November peak is due to social influence, such as the New Year event of 
the lunar calendar, rather than to natural causes. We cannot attribute 
this November peak to inexact registration, as we can the large 
variation of the birth ratio between December and January. The November 
rise in the sex ratio is more apparent in rural districts than in urban 
districts; it becomes less marked from about 1937, and after World War II 
it is no longer found. 


Fig. 3. Ratio by Month 
(Vertical axis: birth ratio; horizontal axis: month, December to January) 

The pattern of seasonal variation of the sex ratio at birth also differs 
somewhat from one part of Japan to another. Generally the seasonal variation 
is smaller in the northern parts than in the southern parts; in Kyūshū, for 
instance, the pattern of all Japan is more exaggerated. 

We have obtained data on hospital births from the clinic of the Tohoku 
National University in Sendai (1922-49), from the Red Cross Hospital 
of Tōkyō (1935-49), from the clinic of the Kyūshū National University 
in Fukuoka (1940-49), and from the Red Cross Hospital of Kurume 
(1940-49). These data are shown in Fig. 4. 

The November peak in the sex ratio is not present in these data. 
Indeed, having regard to the variability of the curves, it would be difficult 
to say that there is any convincing evidence for a seasonal component in 
these hospital data, except in Kyūshū, where the sex ratio at birth in 
April to July is higher than that of births in the other seasons, especially 
in October to February. 


Fig. 4. Sex Ratio of Hospital Births in Three Different Districts 

I. Obstetric Clinic of the Tohoku University, Sendai, 1922-1949 
II. Tōkyō Maternity Hospital of Red Cross Society, 1935-1949 
III. Obstetric Clinic of the Kyūshū University, Fukuoka, and 
Kurume Red Cross Hospital, 1940-1949 


The seasonal variation in number of births is itself of some interest. 
In Occidental cities, such a seasonal variation has almost disappeared 
from the published figures, as was noted by Reed (10) a quarter of a 
century ago. It is tempting to attempt to explain it in terms of the 
marriage rate, as shown in Fig. 5. We are rather disposed to doubt, however, 
whether the variation in marriage rate is the real explanation, because 
the seasonal variation of second births and of higher order births 
does not differ from that of first births, as shown in Fig. 6 (Dosono’s 
(11) data). It would appear that one may appropriately speak of a 
human breeding season in Japan, at least over the 30-year period 
considered here. 


Fig. 5. Relation between Seasonal Distribution of Marriages and Births 
(Marriages compared with births 11 months later) 
(Curves: birth ratio against birth month; marriage ratio against marriage month) 


Fig. 6. Monthly Daily Ratio in Maternity Hospital 
(Curves: first time birth; repeated time birth. Horizontal axis: birth month) 

One might seek to explain the seasonal variations of the sex ratio 
through seasonal variation in stillbirths. However, though there is a 
statistically significant seasonal variation in stillbirths, as shown in Fig. 
7, it is quite insufficient to account for the seasonal effects in the sex 
ratio. It is of interest to note that the stillbirth ratio is high when the 
birth rate is low, and vice versa. One might speculate that perhaps some 
of the apparent seasonal variation in the distribution of births arises out 
of incorrect registration of month of birth. 


A QUANTITATIVE ANALYSIS OF FINGER PRINTS * 


BY DAVID C. RIFE ** 


Department of Zoology 
The Ohio State University 


OVER a period of years various types of genetic and anthropometric 
data have been collected from all students enrolled in an elementary 
course in human heredity at the Ohio State University. These data 
include the ABO blood groups, the M and N blood types, taste reaction 
to PTC, stature, weight, head length and breadth, pain threshold, 
photographs, handedness, and palm and finger prints. 

This report has to do with a comparative quantitative analysis of 
the finger print pattern values of the student population, classified as 
to sex, right and left hands, religion, and national origin of ancestors. 
This information was obtained from questionnaire cards filled out by 
the students. There were five categories with respect to religion: 
Protestant, Jewish, Catholic, miscellaneous, and none. National origin 
referred to the nations from which the parents, grandparents, or even 
more remote ancestors emigrated to the United States. The prints of a 
population of approximately 1300 individuals, collected by Cromwell and 
Rife (1942) and unclassified as to religion and ethnic origin, are also 
included. 

Finger pattern values were classified according to the system employed 
by Cummins (1943), as follows: arches, 0; loops, 1; and whorls, 2. 
Thus the total pattern value of a hand may range from 0 to 10. The 
prints used in this comparison came only from hands that produced 
legible impressions of all five digits; consequently the numbers of right 
and left hands included in any category were seldom the same. 
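
The scoring rule is simple enough to state in code. The Python sketch below assumes that each print is recorded as one of the strings "arch", "loop", or "whorl" (a hypothetical encoding, not the author's record format) and, following the text, scores only hands with legible impressions of all five digits.

```python
# Pattern values as given in the text: arches 0, loops 1, whorls 2.
PATTERN_SCORE = {"arch": 0, "loop": 1, "whorl": 2}


def hand_pattern_value(prints):
    """Total pattern value (0 to 10) of one hand.

    Returns None when the hand does not have five legible prints, so that
    such hands can be left out of the comparison."""
    if len(prints) != 5 or any(p not in PATTERN_SCORE for p in prints):
        return None
    return sum(PATTERN_SCORE[p] for p in prints)


# Hypothetical hand: two whorls, two loops, one arch.
example = hand_pattern_value(["whorl", "loop", "loop", "arch", "whorl"])
print(example)
```

Returning None for an incomplete hand, rather than a partial sum, mirrors the exclusion described above and explains why the numbers of right and left hands in a category need not agree.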

Practically all of the persons whose prints were compared are of 
European descent. The Protestants comprise by far the largest group, 
and have been classified as to whether of British, northwestern European, 
or mixed British and northwestern European descent. 


* This paper was presented at the annual meeting of the American Society of 
Physical Anthropologists in Ann Arbor, Michigan, March, 1951. 
** The Institute of Genetics. 

The Jews are predominantly of Russian and eastern European 
descent, while the Catholics are almost all of Irish or German origin. 
Those belonging to miscellaneous religious groups or professing no 
religion, as well as southern Europeans, Negroes, and Asiatics, con- 
sisted of too few individuals to be included in the comparisons. 

The writer is indebted to the Statistics Laboratory of the Ohio State 
University for a careful statistical analysis of the data. 


DISCUSSION 


Table 1 shows the mean values, the standard errors, and the number 
of hands in each of the various categories. Comparisons of the totals 
within each of the three religious categories and the unclassified (Crom- 
well’s material) reveal the Jews as possessing significantly higher pattern 
values than any of the others. Catholics have the lowest values, Pro- 
testants and Unclassified being intermediate. Although of lesser mag- 
nitude, the differences between Protestants and Catholics are statistically 
significant. Within the Protestants, British show the lowest, and north- 
western Europeans, the highest values. 

With respect to sex and side, males show significantly higher values 
than do females, and right fingers show significantly higher values than 
do left fingers. With the exception of the Catholics, the sex and side 
differences are consistent within each of the subpopulations. Among 
the Catholics, men have slightly lower values than women, and right 
sides have lower values than left sides. The differences are statistically 
insignificant, and these aberrant trends may be attributed to the relatively 
small size of the Catholic group. 

Males show greater interpopulation variability than do females, and 
right hands are more variable than left. Competent students of dermato- 
glyphics have long been aware of these trends, and the data recorded 
here confirm their observations. The interpopulation differences may be 
attributed to genetic drift, whereas the sex differences strongly suggest 
the existence of sex-linked or sex-influenced modifying genes. The 
reasons for bimanual variations in phenotype are as yet obscure. 


SUMMARY 


A comparison of the finger pattern values of a population of approxi- 
mately 3600 persons classified according to sex, side, religion, and 
national origin of ancestors revealed the following differences: 


1. The Jews have higher pattern values than do either the Pro- 
testants or the Catholics, whereas the Catholics have the lowest values 
of the three groups. 


2. Among the Protestants, northwestern Europeans show the highest 
pattern values, British the lowest. 


3. The highest variability in pattern values occurs on the right 
hands of males. 